
@jpinsonneau
Contributor

@jpinsonneau jpinsonneau commented Nov 3, 2025

Performance improvements

1. Lock-free flow updates (bpf/flows.c, bpf/types.h)

  • Removed bpf_spin_lock from the flow_metrics structure.
  • Replaced with atomic operations:
    • __sync_fetch_and_add() for packets and bytes.
    • Direct writes for other fields (idempotent or acceptable occasional races).

Why: Reduces lock contention in the hot path; the packet and byte counters stay exact with atomic adds, while the other fields tolerate occasional races.
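Here is a minimal user-space sketch of the update pattern described above. It assumes a simplified struct; `flow_metrics_sketch` and `update_flow_lockfree` are hypothetical names, not the PR's code. It compiles with GCC or Clang, both of which provide `__sync_fetch_and_add`. When building BPF programs, Clang lowers the same builtin to a BPF atomic add instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for flow_metrics: no bpf_spin_lock. */
typedef struct {
    uint64_t packets;
    uint64_t bytes;
    uint64_t end_mono_time_ts;
} flow_metrics_sketch;

/*
 * Lock-free update of an existing flow entry (sketch, not the PR's code).
 * packets/bytes use atomic fetch-and-add, so concurrent updates from
 * several CPUs never lose an increment. end_mono_time_ts is a plain
 * store: if two CPUs race, either timestamp may end up stored.
 */
static void update_flow_lockfree(flow_metrics_sketch *m, uint64_t len,
                                 uint64_t now_ts)
{
    assert(m != NULL);
    __sync_fetch_and_add(&m->packets, 1);
    __sync_fetch_and_add(&m->bytes, len);
    m->end_mono_time_ts = now_ts;
}
```

The comments show the trade-off. The counters stay exact, but the last-seen timestamp can occasionally be overwritten by a slightly older value.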

2. Loop unrolling optimizations

add_observed_intf() function (bpf/flows.c)

  • Unrolled the loop for up to 6 interfaces.
  • Direct index comparisons instead of a loop.
  • Early exits for common cases (0–3 interfaces).

Why: Removes loop overhead; most flows see 1–2 interfaces.
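The unrolled pattern might look roughly like the user-space sketch below. It assumes MAX_OBSERVED_INTERFACES is 6 and uses a simplified struct. `observed_intf_sketch`, `SEEN_AT`, and `add_observed_intf_unrolled` are hypothetical names, and the real function in bpf/flows.c may differ. Each slot is checked explicitly, and the function jumps straight to the append step as soon as the used slots run out, so the common 1–2 interface case touches only one or two slots.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_OBSERVED_INTERFACES 6

/* Hypothetical stand-in for the per-flow observed-interface list. */
typedef struct {
    uint32_t if_index[MAX_OBSERVED_INTERFACES];
    uint8_t direction[MAX_OBSERVED_INTERFACES];
    uint8_t nb_observed_intf;
} observed_intf_sketch;

/* True if slot i already holds (idx, dir). */
#define SEEN_AT(o, i, idx, dir) \
    ((o)->if_index[i] == (idx) && (o)->direction[i] == (dir))

/* Returns 1 if (if_index, dir) was appended, 0 if already seen or full. */
static int add_observed_intf_unrolled(observed_intf_sketch *o,
                                      uint32_t if_index, uint8_t dir)
{
    uint8_t n = o->nb_observed_intf;
    assert(n <= MAX_OBSERVED_INTERFACES);

    /* One hand-written step per slot; stop as soon as used slots run out. */
    if (n == 0) goto append;
    if (SEEN_AT(o, 0, if_index, dir)) return 0;
    if (n == 1) goto append;
    if (SEEN_AT(o, 1, if_index, dir)) return 0;
    if (n == 2) goto append;
    if (SEEN_AT(o, 2, if_index, dir)) return 0;
    if (n == 3) goto append;
    if (SEEN_AT(o, 3, if_index, dir)) return 0;
    if (n == 4) goto append;
    if (SEEN_AT(o, 4, if_index, dir)) return 0;
    if (n == 5) goto append;
    if (SEEN_AT(o, 5, if_index, dir)) return 0;
    return 0; /* n == 6: list full, interface not recorded */

append:
    o->if_index[n] = if_index;
    o->direction[n] = dir;
    o->nb_observed_intf = (uint8_t)(n + 1);
    return 1;
}
```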

md_already_exists() function (bpf/network_events_monitoring.h)

  • Unrolled the loop for the 4-element array.
  • Direct comparisons for all positions.

Why: Eliminates loop overhead in network event checking.
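A sketch of the 4-element unrolled check follows. The array size of 4 comes from the description above. Everything else is assumed: the 8-byte metadata size (`MAX_EVENT_MD`), the 2-D array layout, and the name `md_already_exists_unrolled`. The real header may store event metadata differently. In BPF code, the comparison would typically use `__builtin_memcmp`. The `_Static_assert` ties the hand-written comparisons to the array size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_NETWORK_EVENTS 4 /* the 4-element array described above */
#define MAX_EVENT_MD 8       /* assumed size of one metadata entry, in bytes */

/* The unrolled body below hard-codes 4 slots; fail the build if that changes. */
_Static_assert(MAX_NETWORK_EVENTS == 4, "update md_already_exists_unrolled");

/* Hypothetical unrolled check: compare md against each slot directly. */
static bool md_already_exists_unrolled(
    uint8_t events[MAX_NETWORK_EVENTS][MAX_EVENT_MD], const uint8_t *md)
{
    assert(md != NULL);
    return memcmp(events[0], md, MAX_EVENT_MD) == 0 ||
           memcmp(events[1], md, MAX_EVENT_MD) == 0 ||
           memcmp(events[2], md, MAX_EVENT_MD) == 0 ||
           memcmp(events[3], md, MAX_EVENT_MD) == 0;
}
```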

3. Early IP filtering (bpf/flows_filter.h, bpf/flows.c, bpf/utils.h)

  • Added early_ip_filter_check() for IP-only rejection without L4 parsing.
  • Split parsing into:
    • fill_ethhdr_l3only() — parses L2+L3 only
    • parse_l4_after_l3() — parses L4 separately
    • fill_iphdr_l3only() / fill_ip6hdr_l3only() — L3-only variants

Why: Skips L4 parsing when IP-based filtering can reject packets early, reducing work.
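The split can be sketched on a raw IPv4 header as follows. The flow is: parse L3 only, apply an IP-only reject rule, and parse the ports only if the packet survives. All names here are hypothetical: `l3_info`, `parse_ipv4_l3only`, `early_ip_reject`, `parse_l4_ports`, and `process_packet`. They stand in for, and are not the same as, the PR's `early_ip_filter_check()`, `fill_ethhdr_l3only()`, and `parse_l4_after_l3()`. Ethernet parsing and IPv6 are left out for brevity, and the single source-CIDR rule is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical L3-only parse result: enough for an IP-based decision. */
typedef struct {
    uint32_t saddr, daddr; /* host byte order, for readability */
    uint8_t proto;
    size_t l4_off;         /* offset of the L4 header, used only if needed */
} l3_info;

typedef struct { uint16_t sport, dport; } l4_info;

/* Hypothetical rule: reject packets whose source address is in net/mask. */
typedef struct { uint32_t net, mask; } ip_reject_rule;

typedef enum {
    VERDICT_MALFORMED,
    VERDICT_REJECTED_EARLY, /* decided from L3 alone, L4 never parsed */
    VERDICT_L3_ONLY,        /* not TCP/UDP, nothing more to parse */
    VERDICT_PARSED
} verdict;

static uint32_t be32(const uint8_t *b)
{
    return (uint32_t)b[0] << 24 | (uint32_t)b[1] << 16 |
           (uint32_t)b[2] << 8 | (uint32_t)b[3];
}

/* Step 1: parse the IPv4 header only. */
static bool parse_ipv4_l3only(const uint8_t *p, size_t len, l3_info *out)
{
    if (len < 20 || (p[0] >> 4) != 4)
        return false;
    size_t ihl = (size_t)(p[0] & 0x0f) * 4;
    if (ihl < 20 || len < ihl)
        return false;
    out->proto = p[9];
    out->saddr = be32(p + 12);
    out->daddr = be32(p + 16);
    out->l4_off = ihl;
    return true;
}

/* Step 2: IP-only filter check, no L4 fields involved. */
static bool early_ip_reject(const l3_info *l3, const ip_reject_rule *r)
{
    return (l3->saddr & r->mask) == r->net;
}

/* Step 3: parse TCP/UDP ports, starting from the saved L4 offset. */
static bool parse_l4_ports(const uint8_t *p, size_t len, const l3_info *l3,
                           l4_info *out)
{
    if (len < l3->l4_off + 4)
        return false;
    const uint8_t *hdr = p + l3->l4_off;
    out->sport = (uint16_t)(hdr[0] << 8 | hdr[1]);
    out->dport = (uint16_t)(hdr[2] << 8 | hdr[3]);
    return true;
}

static verdict process_packet(const uint8_t *p, size_t len,
                              const ip_reject_rule *r, l4_info *l4)
{
    l3_info l3;
    assert(p != NULL && r != NULL && l4 != NULL);
    if (!parse_ipv4_l3only(p, len, &l3))
        return VERDICT_MALFORMED;
    if (early_ip_reject(&l3, r))
        return VERDICT_REJECTED_EARLY;
    if (l3.proto != 6 && l3.proto != 17) /* TCP = 6, UDP = 17 */
        return VERDICT_L3_ONLY;
    return parse_l4_ports(p, len, &l3, l4) ? VERDICT_PARSED : VERDICT_MALFORMED;
}
```

The early path is only sound when the rule's outcome depends on IP fields alone. A rule that also matches ports still needs the L4 parse before it can reject.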

4. Memory initialization optimizations (bpf/flows.c)

  • Replaced __builtin_memset() with explicit field initialization.
  • Uses designated initializers (flow_metrics new_flow = { ... }) and selective initialization.
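  • Initialize only the necessary fields explicitly; members omitted from a designated initializer are zero-initialized by the compiler.

Why: Avoids zeroing the whole struct with memset and then overwriting most of it; the compiler can optimize the combined initialization better.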
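Here is a small sketch of the designated-initializer pattern. It uses a hypothetical subset of the fields named in the PR; `flow_metrics_sketch` and `new_flow_sketch` are illustrative names. It also shows the C rule the approach relies on: members left out of the initializer, such as `flags`, `dscp`, and `nb_observed_intf` here, are zero-initialized.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of flow_metrics fields named in the PR. */
typedef struct {
    uint64_t packets;
    uint64_t bytes;
    uint64_t start_mono_time_ts;
    uint64_t end_mono_time_ts;
    uint32_t if_index_first_seen;
    uint16_t eth_protocol;
    uint16_t flags;
    uint8_t direction_first_seen;
    uint8_t dscp;
    uint8_t nb_observed_intf;
} flow_metrics_sketch;

/* Members not named in the initializer are zero-initialized by the language. */
static flow_metrics_sketch new_flow_sketch(uint32_t ifindex, uint8_t dir,
                                           uint64_t len, uint16_t proto,
                                           uint64_t ts)
{
    flow_metrics_sketch f = {
        .if_index_first_seen = ifindex,
        .direction_first_seen = dir,
        .packets = 1,
        .bytes = len,
        .eth_protocol = proto,
        .start_mono_time_ts = ts,
        .end_mono_time_ts = ts,
    };
    return f;
}
```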

5. Generated Go code updates

  • Updated pkg/ebpf/bpf_*_bpfel.go (all architectures) to remove the Lock field.
  • Updated pkg/model/record_test.go to reflect the removed lock field in binary encoding tests.

Overall impact

These changes target high-frequency paths:

  1. Lock-free updates — reduces contention.
  2. Loop unrolling — removes loop overhead.
  3. Early filtering — skips unnecessary L4 parsing.
  4. Better initialization — fewer unnecessary memory operations.

Together, these reduce CPU cycles per packet, which should improve throughput in a high-traffic eBPF flow monitoring agent.

Dependencies

n/a

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • Will this change affect NetObserv / Network Observability operator? If not, you can ignore the rest of this checklist.
  • Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes).
  • Does this PR require product documentation?
    • If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • Does this PR require a product release notes entry?
    • If so, fill in "Release Note Text" in the JIRA.
  • Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
    • If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):
    • Standard QE validation, with pre-merge tests unless stated otherwise.
    • Regression tests only (e.g. refactoring with no user-facing change).
    • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

To run a perfscale test, comment with: /test ebpf-node-density-heavy-25nodes

@openshift-ci

openshift-ci bot commented Nov 3, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci

openshift-ci bot commented Nov 3, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign mariomac for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jpinsonneau
Contributor Author

/test ebpf-node-density-heavy-25nodes

@jpinsonneau
Contributor Author

jpinsonneau commented Nov 4, 2025

Added a Python script to compare performance: 53d49f8

[Figure: ebpf_performance_visualization]

I was expecting bigger performance improvements here, but it still handles more flows, and the ratio shows improvements in terms of memory.

WDYT @jotak ?
cc @msherif1234 you may be interested too 😸

@codecov

codecov bot commented Nov 4, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 30.00%. Comparing base (e9ebab7) to head (dda38e5).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #824      +/-   ##
==========================================
+ Coverage   29.74%   30.00%   +0.25%     
==========================================
  Files          49       49              
  Lines        5355     4519     -836     
==========================================
- Hits         1593     1356     -237     
+ Misses       3645     3046     -599     
  Partials      117      117              
Flag Coverage Δ
unittests 30.00% <ø> (+0.25%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
pkg/ebpf/bpf_x86_bpfel.go 0.00% <ø> (ø)

... and 47 files with indirect coverage changes


@jpinsonneau
Contributor Author

Comparing against the last 3 runs shows better results: dda38e5

[Figure: ebpf_performance_visualization]

// Interface already seen -> skip
return 0;

// Fast path: unroll loop for small array sizes (most common cases)
Member


I think we must measure this to see how much it improves CPU usage. The downside I see is that the code is less intuitive and readable, and it's also error-prone if we decide to increase MAX_OBSERVED_INTERFACES (we'd need to add new "unrolled" blocks, which can easily be missed).
But optimizations often come with tradeoffs, so that might be OK, depending on the measured improvement.
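Regarding the maintainability concern in this comment, one possible alternative is sketched below: a plain loop over a hypothetical struct. The bound is the compile-time constant MAX_OBSERVED_INTERFACES, so Clang can fully unroll it. The `#pragma unroll` is Clang-specific, so it is guarded here. Raising the constant then needs no new hand-written blocks. Since Linux 5.3, the BPF verifier also accepts bounded loops, so on recent kernels unrolling is an optimization choice rather than a requirement. Whether the compiler-unrolled code costs the same as the hand-unrolled version would still need to be measured. `observed_intf_sketch` and `add_observed_intf_loop` are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_OBSERVED_INTERFACES 6

/* Hypothetical stand-in for the per-flow observed-interface list. */
typedef struct {
    uint32_t if_index[MAX_OBSERVED_INTERFACES];
    uint8_t direction[MAX_OBSERVED_INTERFACES];
    uint8_t nb_observed_intf;
} observed_intf_sketch;

/* Returns 1 if (if_index, dir) was appended, 0 if already seen or full. */
static int add_observed_intf_loop(observed_intf_sketch *o, uint32_t if_index,
                                  uint8_t dir)
{
    uint8_t n = o->nb_observed_intf;
    assert(n <= MAX_OBSERVED_INTERFACES);

#if defined(__clang__)
#pragma unroll
#endif
    for (int i = 0; i < MAX_OBSERVED_INTERFACES; i++) {
        if (i >= n)
            break; /* no more used slots: exit early */
        if (o->if_index[i] == if_index && o->direction[i] == dir)
            return 0; /* interface already seen */
    }
    if (n >= MAX_OBSERVED_INTERFACES)
        return 0; /* list full */
    o->if_index[n] = if_index;
    o->direction[n] = dir;
    o->nb_observed_intf = (uint8_t)(n + 1);
    return 1;
}
```

If the hand-unrolled version is kept instead, a guard such as `_Static_assert(MAX_OBSERVED_INTERFACES == 6, "update the unrolled checks")` placed next to it would turn a forgotten block into a build failure rather than a silently ignored slot.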

Comment on lines +277 to +289
flow_metrics new_flow = {
.if_index_first_seen = skb->ifindex,
.direction_first_seen = direction,
.packets = 1,
.bytes = len,
.eth_protocol = eth_protocol,
.start_mono_time_ts = pkt.current_ts,
.end_mono_time_ts = pkt.current_ts,
.flags = pkt.flags,
.dscp = pkt.dscp,
.sampling = flow_sampling,
.nb_observed_intf = 0 // Explicitly zero for clarity
};
Member


We used to do that previously and switched to individual assignments; iirc @msherif1234 found cases where it didn't work as intended, but I can't remember exactly what. @msherif1234, do you remember?
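The thread does not say what went wrong with the designated initializer. One commonly cited pitfall, offered here only as a possibility, is padding. C does not guarantee that padding bytes of an automatic struct are zeroed by an initializer. Depending on kernel version and loader privileges, the BPF verifier may reject passing stack memory with uninitialized bytes to helpers such as `bpf_map_update_elem`. The sketch below shows the memset-then-assign alternative on a hypothetical struct that has trailing padding on common 64-bit ABIs. `flow_sketch_padded` and `init_flow_zeroed` are illustrative names. In practice, compilers do not write to padding when storing individual members, so the zeroed padding survives. The C standard itself does not promise this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct: 8 + 8 + 4 + 1 = 21 bytes of members,
 * typically padded to 24 bytes on common 64-bit ABIs. */
typedef struct {
    uint64_t packets;
    uint64_t bytes;
    uint32_t if_index_first_seen;
    uint8_t direction_first_seen;
} flow_sketch_padded;

/*
 * Zero the whole object (padding included), then assign fields one by one.
 * A BPF build would use __builtin_memset instead of memset.
 */
static void init_flow_zeroed(flow_sketch_padded *f, uint32_t ifindex,
                             uint8_t dir, uint64_t len)
{
    assert(f != NULL);
    memset(f, 0, sizeof(*f));
    f->if_index_first_seen = ifindex;
    f->direction_first_seen = dir;
    f->packets = 1;
    f->bytes = len;
}
```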
